
Gambler's Ruin Bandit Problem



Abstract

In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state; and a terminal action that moves the learner directly into one of the two terminal states (goal and dead-end state). The current round ends when a terminal state is reached, and the learner incurs a positive reward only when the goal state is reached. The objective of the learner is to maximize its long-term reward (expected number of times the goal state is reached), without having any prior knowledge on the state transition probabilities. We first prove a result on the form of the optimal policy for the GRBP. Then, we define the regret of the learner with respect to an omnipotent oracle, which acts optimally in each round, and prove that it increases logarithmically over rounds. We also identify a condition under which the learner's regret is bounded. A potential application of the GRBP is optimal medical treatment assignment, in which the continuation action corresponds to a conservative treatment and the terminal action corresponds to a risky treatment such as surgery.
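To make the round structure described in the abstract concrete, here is a minimal Python sketch of a single round. It rests on several assumptions that the abstract does not state: the states are the integers 0 (dead end) through G (goal), the continuation action moves one step up with probability p and one step down otherwise, and the terminal action reaches the goal with probability q. The names `goal_probability_continue`, `play_round`, and `threshold` are hypothetical, and the threshold rule is only an illustrative policy, not necessarily the optimal policy form proved in the paper.

```python
import random


def goal_probability_continue(s, G, p):
    """Classic gambler's-ruin probability of reaching G before 0 from state s,
    when every step is +1 with probability p and -1 otherwise (assumed dynamics)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if s <= 0:
        return 0.0
    if s >= G:
        return 1.0
    if p == 0.5:
        return s / G
    r = (1.0 - p) / p
    return (1.0 - r ** s) / (1.0 - r ** G)


def play_round(start, G, p, q, threshold, rng):
    """Simulate one round under an illustrative threshold rule:
    take the terminal action when the state is <= threshold, else continue.
    Returns 1 if the goal state is reached and 0 otherwise."""
    s = start
    while 0 < s < G:
        if s <= threshold:
            # Terminal action: jump straight to goal (prob. q) or dead end.
            return 1 if rng.random() < q else 0
        # Continuation action: random step around the current state.
        s += 1 if rng.random() < p else -1
    return 1 if s == G else 0


if __name__ == "__main__":
    rng = random.Random(0)
    rounds = 10_000
    wins = sum(play_round(2, 5, 0.45, 0.3, 1, rng) for _ in range(rounds))
    print("empirical goal rate:", wins / rounds)
    print("always-continue goal probability:", goal_probability_continue(2, 5, 0.45))
```

In this sketch the learner is assumed to know p and q; in the GRBP itself they are unknown and must be learned over rounds, which is what the regret analysis addresses.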
